Abstract

I.  INTRODUCTION

Reliability and the ability to recover from topological changes are
properties of utmost importance for smooth operation of data-communication
networks. In today's data networks it happens occasionally, more or less
often depending on the quality of the individual devices, that nodes and
communication links fail and recover; also new nodes or links become
operational and have to be added to an already operating network. The
reliability of a computer-communication network, in the eyes of its users,
depends on its ability to cope with these changes, meaning that no breakdown
of the entire network or of large portions of it will be triggered by such
changes and that in finite - and hopefully short - time after their
occurrence, the remaining network will be able to operate normally.
Unfortunately, recovery of the network under all conditions, namely after
arbitrary number, timing and location of topological changes, is a property
that is very hard to ensure, and little successful analytical work has been
done in this direction so far.

The above reliability and recovery problems are difficult whether one
uses centralized or distributed routing control. With centralized routing, one
has the problem of control node failure plus the chicken-and-egg problem of
needing routes to obtain the network information required to establish routes.
Our primary concern here is with distributed routing; here one has the problems
of asynchronous computation of distributed status information and of designing
algorithms which adapt to arbitrary changes in network topology in the absence
of global knowledge of topology. In previous works [1], [2], minimum delay
routing procedures using distributed computation were developed. If the
topology of the network is fixed, these algorithms maintain a loop-free routing
at each step, and if furthermore the input traffic requirements are stationary,
the algorithms bring the network to the minimum delay routing.
The basic algorithm in this paper is similar to, but somewhat
simpler than the algorithms of [1], [2]. Here we do not seek optimality,
but are still interested in maintaining a loop-free distributed adaptive
routing. One of the main contributions of the algorithm given in the
present paper is to introduce features ensuring recovery of the network from
arbitrary topological changes. As such, the resulting algorithm to be
presented in Section II is, to our knowledge, the first one for which all of
the following properties hold and are rigorously proved:

(a)  Distributed  computation. 

(b)  Loop-freedom  for  each  destination  at  all  times. 

(c)  Independently  of  the  sequence,  location  and  quantity  of  topological 
changes,  the  network  recovers  in  finite  time. 

The algorithm provides a protocol using distributed computation for
building routing tables. For each destination, the routing table at each
node i consists of a preferred neighbor p_i and an estimated minimum
distance d_i to the destination (the distance is measured in terms of
possibly time-varying link weights). Property (b) above means that the links
(i,p_i) never form a loop. According to property (c), the algorithm ensures
that a finite time after (arbitrary) topological changes happen, all nodes
physically connected to the destination will form a single tree defined by
the relationship (i,p_i) with the root at the destination. These properties
are stated formally in Section III and rigorously proved in [11].

We may also note that, since we are concerned here only with building
routing tables, the algorithm can be used in (actual or virtual) line switching,
as well as in message or packet switching or any combination thereof. Finally,
the algorithm has the property of not employing any time-out in its operation,
a feature which greatly enhances its amenability to analysis.


In addition to the introduction of the algorithm and the proofs of
its main properties, the paper provides contributions in the direction of
modeling, analysis and validation of distributed algorithms. The operations
required by the algorithm at each node are summarized as a finite-state
machine, with transitions between states triggered by the arrival of updating
messages. During the activity of the algorithm, messages are sent
between neighbors, queued at the receiving node and then processed on a
first-come first-served basis. The processing of a message consists of its
temporary storage in an appropriate memory location, followed by activation
of the finite-state machine, which takes the necessary actions and performs
the corresponding state transitions. In addition to its state, each node
has a set of memory items (i.e. variables) that are used as "context" in the
execution of the finite-state machine. Thus, predicates on the value of
those variables can be used as conditions for transitions to occur, and the
occurrence of transitions may change the value of the variables.

Methods for modeling and validation of various communication protocols
are proposed in [3] - [6]. These methods are designed, however, to handle
protocols involving either only two communicating entities or nodes connected
by a fixed topology. The model we use to describe our algorithm is a
combination of these known models, but is extended to allow us to study a fairly
complex distributed protocol, where arbitrary failures and added links and
nodes cause topological changes. The analysis and validation of the algorithm
is performed by using a special type of induction, to be described in
Section III, that allows us to prove in [11] global properties while essentially
looking at local events.

Several routing algorithms possessing some of the properties indicated
above have been previously described in the literature. In [9], a routing
algorithm similar to the one used in the ARPA network, but with unity link
weights, is presented. It is shown that at the time the algorithm terminates,
the resulting routing procedure is loop-free and provides the shortest paths
to each destination. As with the ARPA routing, however, the algorithm allows
temporary loops to be formed during the evolution of the algorithm. The
algorithm proposed in [10] ensures loop-free routing for individual messages.
This property is achieved by requesting each node to send a probing message to
the destination before each individual rerouting; the node is allowed to
actually perform the rerouting only after receiving an acknowledgement from
the destination. Loop freedom for individual messages is a weaker property
than loop freedom for each destination. For example, in a three-node network,
sending traffic from node 3 to node 1 via node 2 and sending traffic from
node 2 to node 1 via node 3 would be loop-free for individual messages, but not
loop-free for each destination.
See [12] for a more complete discussion of loop freedom.
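
The three-node example can be checked mechanically. The following is a minimal sketch, not part of the original protocol description: it assumes the per-destination routing is given as a dictionary mapping each node to its preferred neighbor (the function name `has_loop` and this representation are illustrative choices), and it reports whether following the pointers ever revisits a node.

```python
def has_loop(pointer):
    """Return True if following preferred-neighbor pointers revisits a node.

    pointer: dict node -> preferred neighbor, or None for a root
             (hypothetical representation chosen for this sketch).
    """
    for start in pointer:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return True          # the pointers form a loop
            seen.add(node)
            node = pointer.get(node)
    return False


# Destination is node 1. Node 3 forwards via node 2 and node 2 via node 3:
# each individual message may still reach node 1, but the per-destination
# pointers (3 -> 2, 2 -> 3) form a loop.
looping = {1: None, 2: 3, 3: 2}
# A loop-free alternative for the same destination: 3 -> 2 -> 1.
tree = {1: None, 2: 1, 3: 2}
```

Here `has_loop(looping)` is true while `has_loop(tree)` is false, which is the distinction between the two notions of loop freedom drawn above.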

II.  THE ALGORITHM

The algorithm is described in several steps. We first present the
operations to be performed at a node in "normal" conditions, when no topological
changes occur. Then we describe the additions to the algorithm in case of
failures and finally the protocols for adding a link to the network. After
the various features of the algorithm are introduced and explained, we proceed
to present the formal algorithm for each node in the network.

Informal  Description  of  the  Algorithm 

The algorithm proceeds independently for each destination.
Consequently, for the rest of the paper we fix the destination and present and
analyze the algorithm for that given destination, which is denoted by SINK.

In normal conditions, each node i in the network has a routing table for
this destination consisting of a preferred neighbor p_i, a node counter
number n_i and an estimated distance d_i to the destination. Essentially,
the algorithm is intended to update or establish these quantities at each step.


eliminate links or nodes that have left the network. In addition to the
quantities p_i, n_i, d_i, a node i keeps a list of its current neighbors,
named LIST_i, and for each node k ∈ LIST_i it keeps two memory locations
called N_i(k) and D_i(k), intended for storage of messages received from k.
During the update activity, messages with format (SINK,m,d) are transmitted
between neighbors, where, if l is the sender, then m = n_l and d = d_l.

(Since we are looking at a particular SINK, we shall suppress this parameter
from now on.) After appending the identification of the sender to each
received message, the receiving node puts the messages in a queue and processes
them one by one on a first-come-first-served (FIFO) basis. We say that a
message is received at node i at time t, if the processor at node i starts
processing the message at time t. As a rule, the first part of processing of
a message (m,d) received at i from l, say, consists of adding to d the
current distance d_il from i to l and then storing m and (d + d_il) in
N_i(l), D_i(l) respectively. The distance d_il can be the estimated delay
over the link (i,l) (as e.g. in ARPA), the estimated incremental delay over
(i,l) as required in [1],[2], or any other quantity reflecting the routing
criterion of interest. This quantity must be positive, but can be estimated in
an arbitrary manner at each node i for each outgoing link. Procedures for
estimation of the incremental delay are indicated in [7], [8].
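
As an illustration of this first processing step, here is a minimal sketch in Python. The function name `store_message` and the dictionaries `N`, `D` and `link_dist` are assumptions made for the example; they stand for the memory locations N_i(.), D_i(.) and the link distances d_il of one node i.

```python
INF = float("inf")


def store_message(N, D, link_dist, sender, m, d):
    """First part of processing a message (m, d) received from `sender`.

    N, D:       dicts standing for N_i(.) and D_i(.) at the receiving node
    link_dist:  dict neighbor -> current positive link distance d_il
    All names are illustrative; only the arithmetic follows the text.
    """
    if link_dist[sender] <= 0:
        raise ValueError("link distance must be positive")
    N[sender] = m                       # counter number carried by the message
    D[sender] = d + link_dist[sender]   # inf + x stays inf for float infinity


N_i, D_i = {}, {}
store_message(N_i, D_i, {"k": 2.5}, "k", m=3, d=4.0)
```

After the call, `N_i["k"]` holds the counter number 3 and `D_i["k"]` holds 4.0 + 2.5. Representing an infinite distance as `float("inf")` is a convenience of this sketch; the special value FAIL used later in the formal description is not modelled here.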

An update cycle is started when the SINK sends a message (m, d = 0) to
all its neighbors. Let us look now at an arbitrary node i in the network,
and describe its procedure under "normal" conditions, namely when no
topological changes occur. A node i collects all messages it receives from
neighbors and stores them as described above; it does nothing else until a
message is received from the preferred neighbor p_i. At this point the node
enters an "alert" state named S2, updates its d_i as the minimum of all D_i(k)
received up to now during the present update cycle and sends (the updated) d_i
to all its neighbors except p_i. After these
operations, the node continues to collect and store messages, until messages are
received from ALL its neighbors. At this point, node i sends d_i to its
preferred neighbor p_i, then updates p_i to be the node with minimum D_i(k) among
all neighbors, erases all stored values N_i(.) and returns to the "normal" state
named S1. The idea of this part of the algorithm is that, since, as will be seen
in Section III, the relation (i,p_i) defines a tree in the network in normal
conditions, the update cycle will propagate from the SINK to the peripheries
while updating d_i and then back from the peripheries to the SINK while updating
the tree.

The transition of node i from state S2 to state S1 signifies completion of
the update step for node i. In particular, transition from S2 to S1 of the
SINK (the SINK enters state S2 when starting the cycle) means completion of
the update cycle by the entire network.
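
The normal-condition cycle described above can be sketched as a small simulation. This is an illustration under stated assumptions, not the formal algorithm: the topology is fixed, there are no failures, the states S3 and the "stagnated" state are omitted, and a single global FIFO queue stands in for the per-node queues (which keeps every link FIFO). The function name `run_update_cycle`, the data layout and the example network are all hypothetical.

```python
from collections import deque

INF = float("inf")


def run_update_cycle(links, parent, sink, m):
    """Simulate one update cycle with counter number m (fixed topology only).

    links:  dict (u, v) -> positive link weight, one entry per undirected link
    parent: dict node -> preferred neighbor p_i, a tree rooted at sink;
            updated in place
    Returns a dict node -> estimated distance d_i after the cycle.
    """
    nbrs = {}
    for (u, v) in links:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    for k in nbrs:
        nbrs[k].sort()                      # deterministic send order

    def weight(u, v):
        return links[(u, v)] if (u, v) in links else links[(v, u)]

    state = {i: "S1" for i in nbrs}
    dist = {i: INF for i in nbrs}
    dist[sink] = 0
    N = {i: {} for i in nbrs}               # N_i(k): counter received from k
    D = {i: {} for i in nbrs}               # D_i(k): received d plus link weight
    queue = deque()                         # entries: (receiver, sender, m, d)

    # The SINK starts the cycle: it enters S2 and sends (m, 0) to all neighbors.
    state[sink] = "S2"
    for k in nbrs[sink]:
        queue.append((k, sink, m, 0))

    while queue:
        i, l, m_rx, d = queue.popleft()
        N[i][l] = m_rx
        D[i][l] = d + weight(i, l)
        if i != sink and state[i] == "S1" and l == parent[i]:
            # Message from the preferred neighbor: enter S2, update d_i,
            # send it to every neighbor except p_i.
            state[i] = "S2"
            dist[i] = min(D[i][k] for k in N[i] if N[i][k] == m)
            for k in nbrs[i]:
                if k != parent[i]:
                    queue.append((k, i, m, dist[i]))
        if state[i] == "S2" and all(N[i].get(k) == m for k in nbrs[i]):
            # Heard from all neighbors: report to p_i, re-select p_i,
            # erase the stored counters and return to S1.
            if i != sink:
                queue.append((parent[i], i, m, dist[i]))
                parent[i] = min(nbrs[i], key=lambda k: D[i][k])
            N[i].clear()
            state[i] = "S1"

    assert all(s == "S1" for s in state.values())   # cycle completed
    return dist


links = {("SINK", "a"): 1, ("SINK", "b"): 4, ("a", "b"): 1, ("b", "c"): 1}
parent = {"a": "SINK", "b": "SINK", "c": "b"}
d_first = run_update_cycle(links, parent, "SINK", 1)
d_second = run_update_cycle(links, parent, "SINK", 2)
```

In this example the first cycle still measures distances along the initial tree (node b keeps distance 4) but re-points b to a; the second cycle then propagates along the improved tree and b's distance drops to 2. This matches the description of the cycle updating d_i on the way out and the tree on the way back.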

Until now, we have not described the role of the cycle counter number
m and of the node counter number n_i. Indeed, if no topological changes occur,
there is no need for these numbers as long as the SINK starts no new update
before the previous one has been completed. It is easy to see that completion
of each cycle is guaranteed to occur in finite time if there are no failures
and if transmission of messages over any link takes a finite time. The formal
statement of this property is included in Theorem

Next, we describe the additions and changes to the basic algorithm for
proper treatment of topological changes. It is here that the role of the
cycle and node counter numbers m and n_i becomes apparent. The SINK starts
consecutive update cycles with nondecreasing counter numbers. If a cycle is
completed, the SINK is allowed to start a new cycle with the same counter number
provided that a cycle with higher number has not been previously started. On
the other hand, when topological changes occur, nodes in the network may
request (by a distributed protocol to be described presently) that the SINK
start a cycle with a counter number that is higher than a specified number. In
this case, if the SINK has not started such an update cycle before, it will
start it immediately and will never reduce the update counter numbers afterwards.
For instance, a sequence of starts and ends of update cycles with their
appropriate counter numbers may be: start 1, end 1, start 1, start 2, end 2,
start 2, .... If the SINK starts a cycle with counter number m and completes it
before starting a new cycle, we say that there has been a proper completion.
We denote the time of proper completion of a cycle with number m by PC(m).


If a node i discovers a failure on link (i,p_i), it enters a "listening"
state S3, sets d_i = ∞, sends (m = n_i, d = ∞) to all neighbors except p_i, sets
p_i = nil and deletes the neighbor corresponding to the failed link from the
list of neighbors LIST_i. A node i receiving (m, d = ∞) from p_i performs similar
operations, except that it also sets n_i ← m and stores (m,d) into
(N_i(p_i), D_i(p_i)), but does not delete p_i from LIST_i. In this way, the
knowledge of the failure propagates backwards to all nodes whose best paths
to the SINK passed through the failed link or node and to their neighbors.
A node i that loses its preferred neighbor by this operation goes into
state S3, and reattachment (i.e. establishing a new preferred neighbor p_i)
will occur as soon as it receives a message (m,d) such that m > n_i and
d < ∞. If at the time it enters S3, node i has already received such a
message, it reattaches immediately. Reattachment consists of choosing as
the new p_i the node from which such message was received, going to
state S2 and updating n_i and d_i. On their part, the neighbors of the nodes
in S3 will know not to choose such nodes as their preferred neighbors.

Up to now, we have described the algorithm of a node i in case
failure occurs on a link (i,p_i). If failure occurs on a link other than
the preferred one, this link is erased from LIST_i, but no special action is
needed except if the node is in state S2. In such situations it may happen
(at this time or later) that the node will receive messages from all the
remaining neighbors and will complete its part in the step of the algorithm by
the usual transition to S1. This is a situation we would like to avoid, because
the transition from S2 to S1 is supposed to signal proper completion of a step
of the algorithm by the corresponding node, and in the case under consideration
the node will complete the step because it lost one of its neighbors and not
because it received the appropriate signal from it. Consequently the algorithm
dictates that if a node i is in state S2 and discovers a failure on links
other than (i,p_i), the node will go into a "stagnated" state S2̄, from which
it will not move until it receives a message over its preferred link. In this
case, it will return to S2 and will continue the algorithm as usual. In Section
III and Appendices, we show that proper advance of the algorithm after this
transition is guaranteed. Another possible transition is from state S2 back to
state S2. This happens when a node is in state S2 and receives a message (m,d)
with m large enough and d < ∞ from its preferred neighbor. From this point
on, the node will proceed as usual. The state diagram for a node is depicted
in Figure 1.


Fig. 1 - State diagram for a node.

In addition to the above operations, any node discovering a failure
on any of the links connected to it generates a message called REQ(n_i).
The number n_i is exactly the node counter number of the generating node.

A node that generates or receives REQ(m) sends it over to its preferred
neighbor p_i if p_i ≠ nil; otherwise it destroys the message. When a
message REQ arrives at a node, it is put in the regular queue and processed
according to FIFO as all other messages. When the SINK receives REQ(m),
it starts a new cycle with counter number (m+1), provided that such a cycle
has not been started previously. Proposition 2 guarantees that if a REQ(m)
is generated, a cycle with counter number (m+1) will always be started
within finite time.
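
The SINK's handling of cycle counter numbers and REQ messages can be sketched as follows. This is an illustrative reading, with hypothetical names (`SinkCycles`, `on_req`); it relies on the fact that the SINK's counter numbers never decrease, so a cycle numbered (m+1) or higher "has been started previously" exactly when the SINK's current counter already exceeds m.

```python
class SinkCycles:
    """Bookkeeping of update-cycle counter numbers at the SINK (sketch)."""

    def __init__(self, first=1):
        self.n = first          # current cycle counter number (assumed start)
        self.log = []           # recorded ("start" | "end", number) events

    def start(self):
        """Start a cycle with the current counter number."""
        self.log.append(("start", self.n))

    def complete(self):
        """Record completion of the cycle with the current counter number."""
        self.log.append(("end", self.n))

    def on_req(self, m):
        """Process REQ(m); return True if a new cycle (m+1) was started."""
        if self.n > m:
            return False        # a cycle numbered above m already exists
        self.n = m + 1
        self.start()
        return True


sink = SinkCycles()
sink.start()
sink.complete()
sink.start()          # same number again: allowed after a completion
started = sink.on_req(1)   # a failure was reported with counter number 1
sink.complete()
sink.start()
ignored = sink.on_req(1)   # stale request: cycle 2 already started
```

The recorded log reproduces the sequence "start 1, end 1, start 1, start 2, end 2, start 2" given earlier as an example, and the second REQ(1) is destroyed because a cycle with a higher number has already been started.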

We finally describe the protocol for adding components to the
network. A node i comes up in state S3 with counter number n_i = 0. For
bringing a link (i,k) up, say, its two ends compare their node counter numbers
n_i and n_k (via a local protocol) and decide that they will bring the link
up with a number strictly higher than

z_i(k) = z_k(i) = max{n_i, n_k} .                                   (1)

Also, if n_i ≥ n_k, then i generates REQ(n_i). The link is finally brought
up by node i when n_i > z_i(k) or when it receives from k a message
(m,d) with m > z_i(k). The same algorithm holds for k. Bringing link
(i,k) up at node i consists of appending k to LIST_i and opening memory
locations N_i(k), D_i(k). In the formal description of the algorithm, this
is done in B.1 and SUBROUTINE NEW. Proposition 2 guarantees that a cycle
with counter number higher than z_i(k) will be started in finite time after
the REQ message is generated.
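
The local decisions involved in bringing a link up are simple comparisons, sketched below. The function names are hypothetical, and the REQ rule follows the off-line operation A.1 (the end whose counter number is at least as large as the other's generates the request).

```python
def link_threshold(n_i, n_k):
    """z_i(k) = z_k(i): the link must come up with a strictly higher number."""
    return max(n_i, n_k)


def should_request(n_i, n_k):
    """Node i generates REQ(n_i) when its counter is not below its peer's."""
    return n_i >= n_k


def may_bring_up(n_i, z, m=None):
    """Node i opens the link once n_i > z, or on a message from k with m > z."""
    return n_i > z or (m is not None and m > z)


# A freshly started node (counter 0) joins a neighbor whose counter is 5.
z = link_threshold(0, 5)
```

With these values the threshold is 5, only the neighbor with counter 5 issues REQ(5), and either end opens the link once it sees counter number 6 or higher, for instance `may_bring_up(0, z, m=6)`.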
Notations

In this subsection, we present several notations that will be used
in the rest of the paper. The notations p_i, (m,d), n_i, d_i, N_i(k), D_i(k),
z_i(k), LIST_i, S1, S2, S2̄, S3, PC(m), have been introduced already. We add
the time in parentheses when we want to refer to the above quantities at a
given time t; for example p_i(t), N_i(k)(t), etc. We also use the
notations:

SX[n] = state SX with node counter number n.

s_i(t) = state and possibly node counter number n_i of node i at time t.
Therefore we sometimes write s_i(t) = S3 for instance and sometimes
s_i(t) = S3[n].

mx_i(t) = largest counter number m received up to time t by node i.

ADD_i = list of nodes k neighboring i such that link (i,k) is ready
to be added to the network.

We use a compact notation to describe changes accompanying a transition, as
follows:

Txy[t,i,RECV(m1,d1,l1),SEND(m2,d2,l2),(n1,n2),(d1,d2),(p1,p2),(mx1,mx2)]

will mean that transition from state Sx to state Sy takes place at time
t at node i caused by receiving (m1,d1) from neighbor l1; in this
transition i sends (m2,d2) to l2, changes its node counter number n_i
from n1 to n2, its estimated distance to destination d_i from d1 to
d2, its preferred neighbor p_i from p1 to p2 and the largest update
counter number received up to now mx_i from mx1 to mx2. For simplicity,
we delete all arguments that are of no interest in a given description, and
if for example n1 is arbitrary we write (φ,n2) instead of (n1,n2).
Similarly, if one of the states is arbitrary, φ will replace this state.

In particular observe that

Tφ2[t,SINK,(φ,n2)]                                                  (2)

means that an updating cycle with number n2 is started at time t and

T21[t,SINK,(n2,n2)]                                                 (3)

means that proper completion of the cycle occurs at time t. If Txy[t],
then we use the notations:

t- = time just before the transition,
t+ = time just after the transition.

We also use

[t,i,RECV(m,d,l)]                                                   (4)

to denote the fact that a message (m,d) is received at time t at i
from l, whether or not the receipt of the message causes a transition.


Assumptions  and  Definitions 

Throughout the paper, we assume that the following hold everywhere
in the network.

1.  All links are bidirectional (full duplex).

2.  d_il(t) > 0 for all links (i,l) and all t.

3.  If a message is sent by node i to a neighbor l, then in finite
    time, either the message will be received correctly at l or
    l will declare link (i,l) as failed. Notice that this assumption
    does not preclude transmission errors that are recovered by a local
    link protocol (e.g. "resend and acknowledgment").

4.  Failure of a node is considered as failure of all links connected to it.
    A node i comes up in state S3, with n_i = 0 and with empty tables.

5.  A link is said to have become operational as soon as messages can be
    sent both ways through it.

6.  A link (i,k) is said to be up if i ∈ LIST_k or k ∈ LIST_i, or both.

7.  Two nodes are said to be physically connected at time t if there
    is a sequence of links that are up at time t connecting the two
    nodes.

8.  A message is said to arrive at node i when it physically arrives
    there. A message is said to be received at node i when the message
    is taken from the queue and the message processor starts processing it.

9.  When a message (m,d) is received at a node, the possible values of
    d may be numerical, d = ∞ or d = FAIL. In the following, d < ∞
    means d ≠ ∞ and d ≠ FAIL. Also D_i(k) < ∞ means D_i(k) ≠ ∞ and
    D_i(k) ≠ FAIL.

10. The local protocols for discovering failures and declaring links
    operational are arbitrary, provided they possess the following properties:

    (a)  If node i declares link (i,k) to be failed, then node k declares
         link (i,k) to be failed in finite time or node i will try to
         reopen the link in finite time.

    (b)  If node k notices that node i tries to bring link (i,k) up
         while node k still considers the link operational, node k first
         declares link (i,k) as failed (and performs step A.4 in the



Formal  Description  of  the  Algorithm 

We are now ready to display the formal algorithm performed by node i.
We divide the description into three parts: the off-line operations, the
message processor and the finite-state machine.

The off-line operations are performed independently of the message
processor and are triggered by a message arrival to the node, by link failures
detected at the node or by new links becoming operational. The main processor
takes the message at the head of the queue and starts processing it. The main
part of the processing consists of the finite-state machine being called and
zero, one or more transitions taking place. As soon as no more transitions
are possible, the action is returned to the message processor.

The "Facts" given in the algorithm are displayed for helping in its
understanding and are proven in Theorem 2.

A.  Off-Line  Operations 

A.1  Compiling the list of newly operational links
     If (i,l) becomes operational, set

         z_i(l) ← max{n_i, n_l} ;

     Append (l, z_i(l)) to ADD_i. If n_i ≥ n_l, generate REQ(n_i) and put it
     in queue (if n_i = n_l, it is enough if only one of the nodes generates
     the REQ).

A.2  Message (m,d) arrives at node i from node l such that l ∈ LIST_i or
     l ∈ ADD_i
     Put (m,d,l) in queue;

A.3  Message REQ(m) arrives at node i
     Put message in queue

A.4  Failure of link (i,l) detected at node i
     Set

         d ← FAIL, m ← FAIL ;

     Put (m,d,l) in queue. Generate REQ(n_i) and put it in queue.

B.  Operations Done by the Message Processor when a Message is Received
    (i.e., when the processor at i takes this message from the queue and
    starts processing it).

    If the message is REQ(m), send it to p_i if p_i ≠ nil and destroy
    it if p_i = nil.

    If the message is (m,d,l), then:

B.1  Open new links:

     If l ∈ ADD_i, then if m > z_i(l), append l to LIST_i and delete
     it from ADD_i.

B.2  Execute

     If d ≠ FAIL, d ≠ ∞, then d ← d + d_il;

     N_i(l) ← m;

     D_i(l) ← d;

     If m ≠ FAIL, then mx_i ← max{m, mx_i};

     EXECUTE FINITE-STATE MACHINE;

     If m = FAIL, delete l from LIST_i.


C.  Finite-State  Machine 
State S1

T12  Condition 12   l = p_i, d < ∞, m = mx_i.

     Fact 12        m ≥ n_i.

     Action 12      d_i ← min { D_i(k) : N_i(k) = m, D_i(k) < ∞ };
                    n_i ← m;
                    ∀k ∈ ADD_i, if n_i > z_i(k), CALL "NEW(k)";
                    transmit (n_i,d_i) to neighbors, except p_i.

T13  Condition 13   l = p_i and (d = ∞ or d = FAIL).

     Fact 13        If m ≠ FAIL, then m ≥ n_i.

     Action 13      d_i ← ∞;
                    If d ≠ FAIL, then n_i ← m;
                    ∀k ∈ ADD_i, if n_i > z_i(k), CALL "NEW(k)";
                    transmit (n_i,d_i) to neighbors, except p_i;
                    p_i ← nil.


State S2

T21  Condition 21   ∀k ∈ LIST_i, N_i(k) = n_i = mx_i;
                    ∃k ∈ LIST_i, s.t. D_i(k) ≤ d_i.

     Fact 21        d ≠ FAIL, D_i(p_i) < ∞.

     Action 21      Transmit (n_i,d_i) to p_i;
                    p_i ← k* that achieves min D_i(k) over k ∈ LIST_i;
                    ∀k ∈ LIST_i set N_i(k) ← nil;
                    exit Finite-State Machine.

T22  Condition 22   l = p_i, d < ∞, m = mx_i > n_i.

     Action 22      Same as Action 12.

T22̄  Condition 22̄   l ≠ p_i, d = FAIL.

     Action 22̄      NONE.

T23  Condition 23   Same as Condition 13.

     Fact 23        Same as Fact 13.

     Action 23      Same as Action 13.
State S3

T32  Condition 32   ∃k ∈ LIST_i such that mx_i = N_i(k) > n_i, D_i(k) < ∞.

     Fact 32        p_i = nil, d_i = ∞.

     Action 32      Let k* achieve min { D_i(k) : N_i(k) = mx_i }.
                    Then p_i ← k*;
                    n_i ← mx_i;
                    d_i ← D_i(k*);
                    ∀k ∈ ADD_i, if n_i > z_i(k), CALL "NEW(k)";
                    transmit (n_i,d_i) to neighbors, except p_i.

State S2̄

T2̄2  Condition 2̄2   l = p_i.

     Fact 2̄2        If m ≠ FAIL, then (m ≥ n_i and if m = n_i then d = ∞).

     Action 2̄2


SUBROUTINE "NEW(k)"

Append k to LIST_i;
delete k from ADD_i;
set N_i(k) ← nil.

This completes the description of the algorithm for all nodes in the
network, except the SINK. The SINK performs the following algorithm:


Off-Line Operations

Same as for all other nodes and in addition, if s_SINK = S1, the
SINK can start a new update cycle at any time by going to S2 and
transmitting d = 0 to all neighbors.

Operations Done by the Message Processor when a Message is Received

If the message is REQ(m) and n_SINK > m, destroy the message.

If the message is REQ(m) and n_SINK ≤ m, go to state S2 and start a new
cycle as follows:

    n_SINK ← (m+1);

    ∀k ∈ ADD_SINK, CALL "NEW(k)";

    transmit (n_SINK, d_SINK = 0) to all neighbors.

If the message is (m,d,l), d = FAIL, then delete l from LIST_SINK.

If the message is (m,d,l), d ≠ FAIL, then

    N_SINK(l) ← m;

    EXECUTE FINITE-STATE MACHINE FOR SINK.


FINITE-STATE MACHINE FOR SINK
State S2

T21  Condition 21   ∀k ∈ LIST_SINK, N_SINK(k) = n_SINK.

     Action 21      ∀k ∈ LIST_SINK, set N_SINK(k) ← nil.

III.  Properties  and  Validation  of  the  Algorithm 

Some of the properties of the algorithm have already been indicated
in previous sections. Here we state them explicitly along with some of the
others. We start with properties that hold throughout the operation of the
network, some of them referring to the entire network at a given instant of
time and some to a given node or link as time progresses. Then we address
the problem of recovery of the network after topological changes.

At a given instant t, we define the Routing Graph RG(t) as the
directed graph whose nodes are the network nodes and whose arcs are given by
the pointers p_i, namely there is an arc from node i to node l if and
only if p_i(t) = l. In order to describe properties of the RG(t), we also
define an order for the states by S3 > S2̄ = S2 > S1. Also Sx ≥ Sy means
Sx > Sy or Sx = Sy. For conceptual purposes, we regard all the actions
associated with a transition of the finite-state machine to take place at
the instant of the transition.

Theorem 1

At any instant of time, RG(t) consists of a set of disjoint trees
with the following ordering properties:

i)   the roots of the trees are the SINK and all nodes in S3;

ii)  if p_i(t) = l, then n_l(t) ≥ n_i(t);

iii) if p_i(t) = l and n_l(t) = n_i(t), then s_l(t) ≥ s_i(t);

iv)  if p_i(t) = l and n_l(t) = n_i(t) and s_l(t) = s_i(t) = S1, then
     d_l(t) < d_i(t).


The proof of Theorem 1 is given in [11]. According to it,
the RG consists at any time of a set of disjoint trees or, equivalently, it
contains no loops. Observe that a tree consisting of a single isolated node
is possible. The algorithm maintains a certain ordering in the trees, namely
that the concatenation (n_i, s_i) is nondecreasing when moving from the leaves
to the root of a tree and in addition, for nodes in S1 and with the same node
counter number, the estimated distances to the SINK are strictly decreasing.
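The ordering properties of Theorem 1 can be checked mechanically on a
snapshot of the network. The following is only a minimal sketch: the snapshot
layout (a dictionary per node with keys "p", "n", "s", "d"), the label "S2bar"
for the barred state and the function name are assumptions made for
illustration and do not appear in the algorithm itself. The sketch also reads
property i) as saying that a node has no pointer exactly when it is the SINK
or is in S3.

```python
import math

# Order of the states as stated in the text: S3 > S2 = S2bar > S1.
STATE_RANK = {"S1": 1, "S2": 2, "S2bar": 2, "S3": 3}


def check_routing_graph(nodes, sink="SINK"):
    """Return a list of (property, node) pairs violated by the snapshot."""
    violations = []
    for i, v in nodes.items():
        parent = v["p"]  # preferred neighbor p_i(t), or None for a root
        is_root = parent is None
        # i) the roots are the SINK and all nodes in S3
        if is_root != (i == sink or v["s"] == "S3"):
            violations.append(("i", i))
        if is_root:
            continue
        w = nodes[parent]
        if w["n"] < v["n"]:                      # ii) n_l >= n_i
            violations.append(("ii", i))
        elif w["n"] == v["n"]:
            if STATE_RANK[w["s"]] < STATE_RANK[v["s"]]:   # iii) s_l >= s_i
                violations.append(("iii", i))
            elif w["s"] == v["s"] == "S1" and not w["d"] < v["d"]:
                violations.append(("iv", i))     # iv) d_l < d_i
    # Loop-freedom: following the pointers from any node must terminate.
    for i in nodes:
        seen, j = set(), i
        while j is not None and j not in seen:
            seen.add(j)
            j = nodes[j]["p"]
        if j is not None:
            violations.append(("loop", i))
    return violations


# Hypothetical snapshot: a chain B -> A -> SINK plus an isolated node C in S3.
snapshot = {
    "SINK": {"p": None, "n": 3, "s": "S1", "d": 0},
    "A": {"p": "SINK", "n": 3, "s": "S1", "d": 1},
    "B": {"p": "A", "n": 3, "s": "S1", "d": 2},
    "C": {"p": None, "n": 2, "s": "S3", "d": math.inf},
}
```

Calling check_routing_graph(snapshot) on a snapshot that satisfies Theorem 1
yields an empty list; any entry in the result names the property and the node
at which the ordering is broken.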

In addition to properties of the entire network at each instant of
time, we can look at local properties as time progresses. Some of the most
important are given in the following theorem whose proof appears in [11].




Theorem 2

i)   For a given node i, the node counter number n_i is nondecreasing
     and the messages (m,d) received from a given neighbor have
     nondecreasing numbers m.

ii)  Between two successive proper completions PC(m̲) and PC(m̄), for each
     given m with m̲ ≤ m ≤ m̄, each node sends to each of its neighbors
     at most one message (m,d) with d < ∞.

iii) Between two successive proper completions PC(m̲) and PC(m̄), for each
     given m with m̲ ≤ m ≤ m̄, a node enters each of the sets of states
     {S1[m]}, {S2[m], S2̄[m]}, {S3[m]} at most once.

iv)  All "Facts" in the formal description of the algorithm in Section II
     are correct.
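Parts i) and ii) of Theorem 2 can be illustrated as checks on a recorded
message trace. This is a hedged sketch only: the trace format (tuples
(sender, receiver, m, d) listed in sending order, covering the interval
between two successive proper completions) and the function name are
assumptions, and the sketch further assumes that each link delivers messages
in the order sent, so that sending order and receiving order coincide.

```python
import math


def check_message_trace(trace):
    """Return True if the trace is consistent with Theorem 2 i) and ii)."""
    last_m = {}          # (sender, receiver) -> last cycle number seen
    finite_sent = set()  # ((sender, receiver), m) with a message d < infinity
    for sender, receiver, m, d in trace:
        link = (sender, receiver)
        # i) numbers m on a given link are nondecreasing
        if m < last_m.get(link, -math.inf):
            return False
        last_m[link] = m
        # ii) at most one message (m, d) with d < infinity per neighbor and m
        if d < math.inf:
            if (link, m) in finite_sent:
                return False
            finite_sent.add((link, m))
    return True
```

A message with d = ∞ does not count toward the limit in ii), which is why the
sketch records only messages with finite d.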

A third  theorem  describes  the  situation  in  the  network  at  the  time 
proper  completion  occurs: 

Theorem  3 

At  PC(m),  the  following  hold  for  each  node  i: 

i)   If n_i = m, then s_i = S1 or s_i = S3.

ii)  If a message (m,d) with d < ∞ is on its way to i, then
     s_i = S3 and n_i = m.

iii) If either {n_i = m and s_i > S1} or n_i < m, then for all k ∈ LIST_i
     it cannot happen that {N_i(k) = m, D_i(k) < ∞}.

A combined proof is necessary to show that the properties appearing
in Theorems 1, 2, 3 hold. The proof uses a two-level induction, first assuming
properties at PC to hold, then showing that the other properties hold
between this and the next PC and finally proving that the necessary properties
hold at the next PC. The second induction level proves the properties
between successive proper completions by assuming that the property holds
until just before the current time t and then showing that any possible
change at time t preserves the property. The entire rigorous procedure
appears in [11].

In  order  to  introduce  properties  of  the  algorithm  regarding  normal 
activity  and  recovery  of  the  network,  we  need  the  following: 


Definition 

Consider a given time t, and let m1 be the highest counter number
of cycles started before t. We say that a pertinent topological change
happens at time t if a node i with n_i(t-) = m1 detects at time t a
failure of one of its neighboring links or observes at time t that an
adjoining link became operational. In other words, a pertinent topological
change happens at time t if and only if a message REQ(m1) is generated
at time t, where m1 is the largest cycle counter number available at
time t in the network.
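The definition reduces to a simple comparison, sketched below. The function
name and its arguments are hypothetical: the first argument stands for
n_i(t-), the counter of the detecting node just before t, and the second for
the collection of counter numbers of all cycles started before t.

```python
def is_pertinent_change(n_i_before, started_cycle_numbers):
    """True if a topological change detected by a node whose counter is
    n_i_before is pertinent, i.e. n_i(t-) equals the highest counter
    number m1 of the cycles started so far."""
    m1 = max(started_cycle_numbers)
    return n_i_before == m1
```

For instance, with cycles 1, 2 and 3 already started, a change detected by a
node whose counter is still 2 is not pertinent, whereas the same change
detected by a node with counter 3 is.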


Theorem 4 (Normal activity)

Let

L(t) = {nodes physically connected to SINK at time t}.

Suppose

T*2[t1, SINK, (m1, m1)]                                              (5)

namely a cycle is started at t1 with a number that was previously used.
Suppose also that no pertinent topological changes have happened while
n_SINK = m1 before t1 and no such changes happen after t1 for long enough
time. Then there exist t0, t2, t3 with t0 < t1 < t2 < t3 < ∞ such that
a), b), c), d) hold:

a)  T21[t0, SINK, (m1, m1)];                                         (6)

b)  ∀t ∈ [t0, t3], we have L(t) = L(t0);

c)  for all i ∈ L(t0) we have

    T*2[t2_i, i, (m1, m1)]                                           (7)

    for some time t2_i ∈ [t1, t2];

d)  i)  T21[t3, SINK, (m1, m1)];                                     (8)

    ii) RG(t3) for all nodes in L(t0) is a single tree rooted at
        SINK.

In words, Theorem 4 says that under the given conditions, if a new
cycle starts with a number that was previously used, then PC with the same
number has previously occurred and the new cycle will be properly completed
in finite time. The proof of Theorem 4 is given in [11].

The recovery properties of the algorithm are described in Propositions
1, 2 and in Theorem 5. The proofs of the propositions appear in [11].

Proposition  1 

Let L(t) be as in Theorem 4. Suppose

T*2[t1, SINK, (m1, m2)] ; m2 > m1 ,                                  (9)

namely a cycle starts at time t1 with a number that was not previously
used. Suppose also that no pertinent topological changes happen for a
long enough period after t1. Then

a)  there exists a time t2, with t1 ≤ t2 < ∞, such that for all
    i ∈ L(t2)

    T*2[t2_i, i, (*, m2)]                                            (10)

    happens at some time t2_i with t1 ≤ t2_i ≤ t2.

b)  There exists a time t3 < ∞ such that

    i)  T21[t3, SINK, (m2, m2)];                                     (11)

    ii) RG(t3) for all nodes in L(t3) is a single tree rooted at SINK.

Part a) of Proposition 1 says that under the stated conditions,
all nodes in L(t) will eventually enter state S2[m2]. Part b) says that
the cycle will be properly completed and all nodes physically connected to the
SINK at time PC(m2) will also be connected to the SINK by the Routing Graph.

Finally, we observe that reattachment of a node losing its path to
the SINK or bringing a link up requires a cycle with a counter number higher
than the one the node currently has. Proposition 2 ensures that such a cycle
has been or will be started in finite time by the SINK.

Proposition  2 

Suppose that a message REQ(m1) is generated at some time t at
some node in the network. Then the SINK has received before t a message
REQ(m1) or will receive such a message in finite time after t.

Propositions  1 and  2 are  combined  in: 

Theorem  5 (Recovery  theorem) 

Let L(t) be as in Theorem 4. Suppose there is a time t1 after
which no pertinent topological changes happen in the network for long enough
time. Then there exists a time t3 with t1 ≤ t3 < ∞ such that proper
completion happens at t3 and such that all nodes in L(t3) are on a single
tree rooted at SINK.

Proof

Let t0 ≤ t1 be the time of the last pertinent topological change
before t1. Let i be the node detecting it and let m = n_i(t0-). Then
by definition, a message REQ(m) is generated at time t0. By Proposition
2, a message REQ(m) arrives at some finite time at SINK. Let
t2 < ∞ be the time the first REQ(m) message arrives at SINK. The algorithm
dictates that SINK will start at time t2 a new cycle, with number
m1 = m+1. Since by the definition of pertinent change, m is the largest
number at time t0, we have that t0 < t2. By assumption, no pertinent
topological changes happen after time t0 for a long enough period, so
that no such changes happen after time t2. Consequently Proposition 1
holds after this time and the assertion of the Theorem follows.
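The step of the proof in which the SINK reacts to the first REQ(m) can be
sketched as follows. This is an illustration under stated assumptions rather
than the formal algorithm: the class and method names are hypothetical, the
sketch assumes that the SINK acts on REQ(m) only when m equals its current
counter n_SINK (so that later copies of the same request are ignored), and it
omits the neighbor tables and the rest of the finite-state machine.

```python
class SinkSketch:
    """Minimal model of how the SINK starts a new cycle on a request."""

    def __init__(self, n_sink=0):
        self.n = n_sink      # current cycle counter n_SINK
        self.state = "S1"

    def on_req(self, m):
        """Handle REQ(m); return the message to send to all neighbors,
        or None if the request does not trigger a new cycle."""
        if m != self.n:
            return None      # assumed: stale request, already served
        self.n = m + 1       # new cycle number m1 = m + 1
        self.state = "S2"
        return (self.n, 0)   # (n_SINK, d_SINK = 0)
```

Under these assumptions the first REQ(4) received by a SINK with counter 4
starts cycle 5, and any further REQ(4) leaves the counter unchanged.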

IV.  DISCUSSION  AND  CONCLUSIONS 

The paper presents an algorithm for constructing and maintaining
loop-free routing tables in a data network, when arbitrary failures and
additions happen in the network. Clearly, these properties hold also for
several other versions of the algorithm, some of them simpler and some of
them more involved than the present one. We have decided on the present
form of the algorithm as a compromise between simplicity and still keeping
some properties that are intuitively appealing. For example, one possibility
is to increase the update cycle number every time a new cycle is started. This
will not simplify the algorithm, but will greatly simplify the proofs. On the
other hand, it will require many more bits for the update cycle and node
numbers m and n_i than the algorithm given in the paper. Another version of the
algorithm previously considered by us was to require that every time a node i
receives a number higher than n_i from some neighbor, it will "forget" all
its previous information and will "reattach" to that node immediately, by a
similar operation to transition T32. This change in the algorithm would
considerably simplify both the algorithm and the proofs, but every topological
change will affect the entire network, since after any topological change
all nodes will act as if they had no previous information. On the other
hand, the version given in the paper "localizes" failures in the sense that
only those nodes whose best path to SINK was destroyed will have to forget
all their previous information. This is performed in the algorithm by
requiring that nodes not in S3 will wait for a signal from the preferred
neighbor p_i before they proceed, even if they receive a number higher than n_i
from other neighbors. The signal may be either ∞, in which case the node
enters S3 (and eventually reattaches), or less than ∞, in which case the
node proceeds as usual.